A measure of phonetic similarity to quantify pronunciation variation by using ASR technology
نویسندگان
چکیده
It attracts researchers’ interest how to define a quantitative measure of phonetic similarity between IPA transcripts of the same sentence read by two speakers. This problem can be divided into how to align two transcripts and how to quantify alignment gap. In this paper, we introduce a method of similarity calculation using phone-based or phoneme-based acoustic models trained with the algorithm used to develop Automatic Speech Recognition (ASR) systems. Use of acoustic models will introduce an issue of speaker dependency because speech spectrums always convey the information of the training speakers’ age and gender, which is totally irrelevant to phonetic similarity calculation. We examine how independent our method is of training speakers and how close the calculated similarity is to the similarity subjectively rated through a listening test. We also compare our method to recent works and show our method can give higher correlation by 4 points to human-rated similarity.
منابع مشابه
Using an underspecified ASR system as an indicator for phonetic similarity
This paper presents a novel approach to the identification of phonetic similarity using properties observed during the speech recognition process. An experiment is presented whereby specific phones are removed during the training phase of a statistical speech recognition system so that the behaviour of the system can be analysed to see which alternative phone is selected. The domain of the anal...
متن کاملUsing accent information in ASR models for Swedish
A common technique to cope with the large variability in the acoustic realisations of the phonetic classes in speech, is to partition the data according to a linguistically significant variable. In this work, accent dependent phonetic models were trained and used both as an analysis tool for pronunciation variation and in the attempt to improve ASR performance. The Idea Accent dependent trainin...
متن کاملA study of implicit and explicit modeling of coarticulation and pronunciation variation
In this paper, we focus on the modeling of coarticulation and pronunciation variation in Automatic Speech Recognition systems (ASR). Most ASR systems explicitly describe these production phenomena through context-dependent phoneme models and multiple pronunciation lexicons. Here, we explore the potential benefit of using feature spaces covering longer time segments in terms of implicit modeling...
متن کاملUnderspecified Feature Models for Pronunciation Variation in Asr
In the 1990s, several studies showed that if we could just predict correctly when to include alternate pronunciations of words in ASR lexica, we could greatly reduce error rates for conversational speech tasks (i.e., Switchboard). But it is clear that the field has thus far failed to reach that potential. Many scholars model pronunciation variation via a substitution of one phonetic sequence fo...
متن کاملIncorporating Contextual Phonetics into Automatic Speech Recognition
This work outlines the problems encountered in modeling pronunciation for automatic speech recognition (ASR) of spontaneous (American) English speech. We detail some of the phonetic phenomena within the Switchboard corpus that make the recognition of this speaking style difficult. Phonetic transcribers found that feature spreading and cue trading made identification of phonetic segmental bounda...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2015